NATURE VOL. 308 19 APRIL 1984 


751 


LETTERS TO NATURE


Sequence and topology of a model
intracellular membrane protein,
E1 glycoprotein, from a coronavirus

John Armstrong, Heiner Niemann*, Sjef Smeekens†‡,
Peter Rottier† & Graham Warren

European Molecular Biology Laboratory, Postfach 10.2209, 

6900 Heidelberg, FRG 

* Institut für Virologie, Fachbereich Humanmedizin der
Justus-Liebig-Universität, Giessen, FRG

† Institute of Virology, Veterinary Faculty, State University of
Utrecht, 3509 TD Utrecht, The Netherlands 


In the eukaryotic cell, both secreted and plasma membrane
proteins are synthesized at the endoplasmic reticulum, then
transported, via the Golgi complex, to the cell surface (refs 1-4).
Each of the compartments of this transport pathway carries out
particular metabolic functions (refs 5-8), and therefore presumably
contains a distinct complement of membrane proteins. Thus,
mechanisms must exist for localizing such proteins to their
respective destinations. However, a major obstacle to the study
of such mechanisms is that the isolation and detailed analysis
of such internal membrane proteins pose formidable technical
problems. We have therefore used the E1 glycoprotein from
coronavirus MHV-A59 as a viral model for this class of protein.
Here we present the primary structure of the protein, determined
by analysis of cDNA clones prepared from viral mRNA.
In combination with a previous study of its assembly into the
endoplasmic reticulum membrane, the sequence reveals
several unusual features of the protein which may be related
to its intracellular localization.

The coronaviruses are a diverse class of enveloped RNA
viruses of considerable medical and agricultural significance;
they also provide a model for the study of persistent viral
infections (see ref. 10 for review). In contrast to many enveloped
viruses, the coronavirus mouse hepatitis virus (MHV) A59 buds
inside the cell, into the lumen of the endoplasmic reticulum
(refs 11-14). The assembled virion then appears to travel, via the
Golgi complex, to the cell surface. Of the two viral membrane
proteins, the smaller one, E1, is necessary for formation of the
envelope, and is restricted to internal cell membranes; apparently
it only reaches the cell surface as part of the budded virion
(refs 12, 13). Thus, the E1 glycoprotein is potentially a convenient
model for studying those features of a membrane protein that
determine its arrest at a particular destination on the membrane
transport pathway.

The mRNAs of MHV-A59 form a ‘nested set’: the seven
RNAs share the 3' region of the positive-stranded genome, but
extend to different lengths towards the 5' end (refs 15-18). From
each RNA, only the 5' gene is translated (refs 19, 20). In addition,
a non-coding ‘leader’ sequence of approximately 70 bases, from the
5' end of the genome, is common to the mRNAs (refs 18, 21, 22). The
E1 gene is second from the 3' end and is therefore translated
from the second smallest mRNA, RNA 6 (refs 19, 20). The
sequence of the 3'-terminal gene, encoding the viral nucleocapsid
protein, has been determined previously (refs 23, 24).
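
The nested-set arrangement described above can be illustrated with a minimal sketch. It assumes, purely for illustration, one gene per step of the set; only the names E1 and nucleocapsid come from the text, and the other five gene labels are hypothetical placeholders.

```python
from dataclasses import dataclass

# Gene order along the genome, 5' -> 3'. Only the last two names come from the
# text (E1 is second from the 3' end; the nucleocapsid gene is 3'-terminal).
# The five "gene_*" labels are hypothetical placeholders.
GENES_5_TO_3 = ["gene_a", "gene_b", "gene_c", "gene_d", "gene_e",
                "E1", "nucleocapsid"]


@dataclass(frozen=True)
class NestedMRNA:
    number: int   # RNA 1 is the longest, RNA 7 the smallest
    genes: tuple  # genes carried by this RNA, listed 5' -> 3'

    @property
    def translated_gene(self) -> str:
        # Only the 5'-proximal gene of each RNA is translated.
        return self.genes[0]


def nested_set(genes):
    """Build RNAs that share the 3' end and extend to different 5' lengths."""
    return [NestedMRNA(i + 1, tuple(genes[i:])) for i in range(len(genes))]


mrnas = nested_set(GENES_5_TO_3)
for rna in mrnas:
    print(f"RNA {rna.number}: {len(rna.genes)} gene(s), "
          f"translated gene = {rna.translated_gene}")
```

In this toy model RNA 6 carries both the E1 and nucleocapsid genes but yields only E1, while RNA 7 carries the nucleocapsid gene alone.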

Copy DNA clones spanning the E1 gene were prepared by
two methods (refs 23-25) and sequenced in the vectors M13mp8
(ref. 26) or pEMBL8 (ref. 27) by the chain-termination method
(ref. 28; data available on request). A sequence of 780 nucleotides
(Fig. 1), containing a single long open reading frame, precedes
the coding region for the viral nucleocapsid protein. A leader
of 76 nucleotides, almost identical to the leader of the smallest
mRNA, RNA 7 (ref. 22), lies in front of the first potential
initiator codon. Thus, the sequence in Fig. 1 represents the 5'
end of RNA 6, encoding the E1 protein and starting at or near
the extreme 5'-terminal nucleotide.

CCTATAAGAGTGATTGGCGTCCGTACGTACCCTCTCAACTCTAAAACTCTTGTAGTTTAA
        10                  30                  50

                MetSerSerThrThrGlnAlaProGluProValTyrGlnTrpTh
ATCTAATCCAAACATTATGAGTAGTACTACTCAGGCCCCAGAGCCCGTCTATCAATGGAC
        70                  90                 110

rAlaAspGluAlaValGlnPheLeuLysGluTrpAsnPheSerLeuGlyIleIleLeuLe
GGCCGACGAGGCAGTTCAATTCCTTAAGGAATGGAACTTCTCGTTGGGCATTATACTACT
       130                 150                 170

uPheIleThrIleIleLeuGlnPheGlyTyrThrSerArgSerMetPheIleTyrValVa
CTTTATTACTATGATACTACAGTTCGGTTACACGAGCCGTAGCATGTTTATTTATGTTGT
       190                 210                 230

lLysMetIleIleLeuTrpLeuMetTrpProLeuThrIleValLeuCysIlePheAsnCy
GAAAATGATAATCTTGTGGTTAATGTGGCCACTGACTATTGTTTTGTGTATTTTCAATTG
       250                 270                 290

sValTyrAlaLeuAsnAsnValTyrLeuGlyPheSerIleValPheThrIleValSerIl
CGTGTATGCGCTAAATAATGTGTATCTTGGATTTTCTATAGTGTTTACTATAGTGTCCAT
       310                 330                 350

eValIleTrpIleMetTyrPheValAsnSerIleArgLeuPheIleArgThrGlySerTr
TGTAATCTGGATTATGTATTTTGTTAATAGCATAAGGTTGTTTATCAGGACTGGTAGCTG
       370                 390                 410

pTrpSerPheAsnProGluThrAsnAsnLeuMetCysIleAspMetLysGlyThrValTy
GTGGAGCTTCAACCCCGAAACAAACAACCTTATGTGTATAGATATGAAAGGTACCGTGTA
       430                 450                 470

rValArgProIleIleGluAspTyrHisThrLeuThrAlaThrIleIleArgGlyHisLe
TGTTAGACCCATTATTGAGGATTACCATACACTAACAGCCACTATTATTCGTGGCCACCT
       490                 510                 530

uTyrMetGlnGlyValLysLeuGlyThrGlyPheSerLeuSerAspLeuProAlaTyrVa
CTACATGCAAGGTGTTAAGCTAGGCACCGGTTTCTCTTTGTCTGACTTGCCCGCTTATGT
       550                 570                 590

lThrValAlaLysValSerHisLeuCysThrTyrLysArgAlaPheLeuAspLysValAs
TACAGTTGCTAAGGTGTCACACCTTTGCACTTATAAGCGCGCATTCTTAGACAAGGTAGA
       610                 630                 650

pGlyValSerGlyPheAlaValTyrValLysSerLysValGlyAsnTyrArgLeuProSe
CGGTGTTAGCGGTTTTGCTGTTTATGTGAAGTCCAAGGTCGGAAATTACCGACTGCCCTC
       670                 690                 710

rAsnLysProSerGlyAlaAspThrAlaLeuLeuArgIle*
AAACAAACCGAGTGGCGCGGACACCGCATTGTTGAGAATCTAATCTAAACTTTAAGGATG
       730                 750                 770

Fig. 1 Sequence of the E1 cDNA and protein extending to the
initiator codon of the adjacent nucleocapsid gene (ref. 23). Proposed
membrane-spanning regions are overlined.
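
The position of the initiator codon and the start of the reading frame in Fig. 1 can be checked mechanically. The following is a minimal sketch, not part of the original analysis: it uses only the first 120 nucleotides of the figure, the standard genetic code, and a hypothetical helper named `translate_from_first_atg`.

```python
# First 120 nucleotides of the cDNA sequence in Fig. 1 (positions 1-120).
SEQ_1_120 = (
    "CCTATAAGAGTGATTGGCGTCCGTACGTACCCTCTCAACTCTAAAACTCTTGTAGTTTAA"
    "ATCTAATCCAAACATTATGAGTAGTACTACTCAGGCCCCAGAGCCCGTCTATCAATGGAC"
)

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_from_first_atg(seq):
    """Return (leader length, peptide) reading from the first ATG.

    Translation stops at a stop codon or when fewer than three
    nucleotides remain. Hypothetical helper for illustration only.
    """
    start = seq.find("ATG")
    if start < 0:
        raise ValueError("no ATG found")
    peptide = []
    for i in range(start, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return start, "".join(peptide)


leader_length, peptide = translate_from_first_atg(SEQ_1_120)
print(leader_length, peptide)
```

On this fragment the first ATG is preceded by 76 nucleotides, in agreement with the leader length given in the text, and the peptide begins Met-Ser-Ser-Thr-Thr.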

Two versions were found, in two different clones, for the
sequence immediately upstream from the E1 initiator codon.
The shorter one is shown in Fig. 1; in the second clone, an
additional copy of the pentanucleotide ATCTA was found
between nucleotides 65 and 66, making the sequence similar to
that of the region adjacent to the nucleocapsid gene of another
strain of MHV (ref. 29). This difference could represent a mutation;
alternatively, it may reflect heterogeneity in the normal mRNA
population. Indirect support for the latter possibility comes from
the observation that an RNase T1 oligonucleotide from this
region of RNA 6, corresponding to the shorter sequence, was
recovered in markedly lower yield than those from the rest of
the molecule (ref. 30). This site represents the point of fusion
between the 5' leader sequence and the coding portion of the RNA.
The fusion is thought to occur by ‘jumping’ of the viral RNA
polymerase to particular sites on its genome-length,
negative-stranded template; the resumption of transcription then
produces each of the subgenomic mRNAs (refs 22, 31, 32). Thus, it
seems possible that the polymerase may jump to more than one point
on the template for each mRNA, generating variable numbers
of the repeated pentanucleotide AUCUA in the resulting
transcript.
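
The difference between the two clones can be made concrete with a short sketch. It takes nucleotides 51-80 and 754-780 from Fig. 1 as given; the helper names `insert_after` and `max_tandem_copies` are hypothetical and introduced only for this illustration.

```python
REPEAT = "ATCTA"  # cDNA form of the AUCUA pentanucleotide

# Nucleotides 51-80 of Fig. 1 (shorter clone); the E1 initiator ATG is 77-79.
SHORT_CLONE = "TGTAGTTTAAATCTAATCCAAACATTATGA"
FIRST_POSITION = 51

# Nucleotides 754-780 of Fig. 1, ending at the nucleocapsid initiator ATG.
N_UPSTREAM = "GAGAATCTAATCTAAACTTTAAGGATG"


def insert_after(seq, first_position, position, insert):
    """Insert `insert` after 1-based `position`; seq[0] is `first_position`."""
    cut = position - first_position + 1
    return seq[:cut] + insert + seq[cut:]


def max_tandem_copies(seq, unit):
    """Largest number of back-to-back copies of `unit` found anywhere in seq."""
    best = 0
    for start in range(len(seq)):
        copies = 0
        while seq.startswith(unit, start + copies * len(unit)):
            copies += 1
        best = max(best, copies)
    return best


# Second clone: one extra ATCTA between nucleotides 65 and 66.
LONG_CLONE = insert_after(SHORT_CLONE, FIRST_POSITION, 65, REPEAT)

for name, seq in [("short clone", SHORT_CLONE),
                  ("long clone", LONG_CLONE),
                  ("nucleocapsid upstream", N_UPSTREAM)]:
    print(name, max_tandem_copies(seq, REPEAT))
```

Under these assumptions the shorter clone carries one copy of ATCTA, whereas the longer clone and the region upstream of the nucleocapsid gene in Fig. 1 each carry two tandem copies.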

Figure 1 shows the amino acid sequence encoded by the E1
gene. The predicted molecular weight of the protein is 26,000,
slightly higher than that observed by gel electrophoresis
(refs 19, 33) but consistent with the unusual electrophoretic
behaviour of this (ref. 33), and other, hydrophobic proteins.
Several features of the protein, when assembled into membranes in
the virus (ref. 33), or in vitro (ref. 9), are reflected in the
sequence. First, in contrast to the majority of membrane proteins,
E1 is known to lack a cleaved ‘signal peptide’ (ref. 9): the
N-terminal region of the sequence contains no good candidate for a
cleavage site (ref. 34). Second, the N-terminal region bears
O-linked sugars (refs 35, 36), which, uniquely among viral
proteins so far studied, are the only known post-translational
modification to E1. Assuming that the terminal Met is
removed (ref. 37), the N-terminal sequence is Ser-Ser-Thr-Thr, which
is identical to the O-glycosylated amino terminus of M-type
glycophorin A (ref. 38). The O-linked sugars of E1 are themselves
identical to those found in glycophorin (ref. 39). Third, most of
the protein is resistant to proteolysis when assembled in the
membrane. Only 2.5 kilodaltons of polypeptide from the N-terminus
are cleavable on the luminal side of the membrane (or
outside the virion) and 1.5 kilodaltons from the C-terminus
from the cytoplasmic (or intra-virion) side (ref. 9), suggesting
that the protein is largely buried in the membrane. In the
sequence, a run of 22 uncharged residues from positions 26 to 47
represents a potential membrane-spanning region; residues 1-25
correspond to the portion removable by protease. A further sequence
of uncharged residues, positions 57-106, is sufficiently long to
cross the membrane twice more. If this region is divided in two,
and each half plotted as an α-helical ‘wheel’, all the polar side
chains of both sections cluster within 140°. Thus, a plausible
conformation for this region is two hairpinned helices in the
membrane, with adjacent polar faces (Fig. 2a). There are no
other long hydrophobic sequences, implying that the region
from residues 107 to ~190 is either folded in the membrane to
neutralize charges, or, more likely, is adjacent to the membrane
but resistant to proteolysis. These features are summarized in
Fig. 2b.
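
Two of the figures quoted above, the predicted molecular weight and the positions of the long uncharged runs, can be rechecked from the sequence. The sketch below rests on assumptions that are not in the original: the one-letter sequence is transcribed from the amino acid line of Fig. 1, the residue masses are standard average values supplied here, "charged" is taken to mean Asp, Glu, Lys and Arg (His is treated as uncharged), and the 20-residue threshold is arbitrary.

```python
# E1 sequence in one-letter code, transcribed from the amino acid line of Fig. 1.
E1 = (
    "MSSTTQAPEPVYQWT"
    "ADEAVQFLKEWNFSLGIILL"
    "FITIILQFGYTSRSMFIYVV"
    "KMIILWLMWPLTIVLCIFNC"
    "VYALNNVYLGFSIVFTIVSI"
    "VIWIMYFVNSIRLFIRTGSW"
    "WSFNPETNNLMCIDMKGTVY"
    "VRPIIEDYHTLTATIIRGHL"
    "YMQGVKLGTGFSLSDLPAYV"
    "TVAKVSHLCTYKRAFLDKVD"
    "GVSGFAVYVKSKVGNYRLPS"
    "NKPSGADTALLRI"
)

# Average residue masses in daltons (assumed standard values, not from the paper).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02
CHARGED = set("DEKR")  # assumption: His counted as uncharged


def molecular_weight(protein):
    """Sum of residue masses plus one water for the free termini."""
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


def uncharged_runs(protein, min_length=20):
    """1-based (start, end) of runs without charged residues, >= min_length."""
    runs, start = [], None
    # A charged sentinel is appended so that a run reaching the C-terminus closes.
    for pos, aa in enumerate(protein + "K", start=1):
        if aa in CHARGED:
            if start is not None and pos - start >= min_length:
                runs.append((start, pos - 1))
            start = None
        elif start is None:
            start = pos
    return runs


print(len(E1), round(molecular_weight(E1)), uncharged_runs(E1))
```

With these assumptions the 228-residue sequence gives a molecular weight close to 26,000, and the only uncharged runs of 20 or more residues are positions 26-47 and 57-106, as stated in the text.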

Which, if any, of these various features might be responsible 
for the protein’s intracellular localization? We do not know, for 
example, whether the protein has an active ‘signal’ causing its 
arrest on the transport pathway, or, alternatively, if it lacks a 
signal for onward transport; nor do we know whether a sorting 
process might operate on one or the other side of the membrane. 
The availability of a cDNA clone for the protein presents the 
opportunity to investigate these questions by allowing 
expression of the cloned DNA and in vitro mutagenesis. 

This approach has already been applied to two other viral 
glycoproteins, to investigate the importance of their cytoplasmic 
domains for transport to the cell surface, yielding opposite 
conclusions (refs 40, 41). An intrinsic problem with the method,
however, is the difficulty of distinguishing specific effects due to 
alterations at the site of mutagenesis, from a general structural 
disruption of the molecule. In this respect the E1 protein may
be advantageous in that it provides the possibility of creating a 
more ‘active’ phenotype in the mutated molecule: specifically, 
particular alterations to the protein may result in its transport 
to the cell surface. 

We thank Willy Spaan for communicating results before
publication, G. Heisterberg-Moutsis (G. B. F. Braunschweig) for
help with oligonucleotide synthesis, Ben van der Zeijst for
discussion and Annie Steiner for preparing the manuscript. J. A.
was supported by fellowships from the Royal Society and the

Fig. 2 a, Distribution of polar side chains in the hydrophobic
regions of the E1 sequence. Residues 26-47 (1), 57-81 (2) and
82-106 (3) are plotted as α-helices and viewed end-on. Polar side
chains are boxed; proposed hydrophilic faces of helices 2 and 3
are indicated. b, Possible topologies of the E1 protein across
the membrane. Arrows indicate sites accessible to protease; broken
arrows represent inefficient proteolysis (ref. 9).
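
The end-on helix view in Fig. 2a corresponds to a helical-wheel projection, which can be sketched as follows. The assumptions here are an ideal α-helix with 100° per residue and a polar set of Ser, Thr, Asn and Gln; the residues actually boxed in the figure are not recoverable from the text, so this set is a guess. The segment strings are transcribed from Fig. 1.

```python
DEGREES_PER_RESIDUE = 100.0  # ideal alpha-helix, 3.6 residues per turn

HELIX_2 = "MIILWLMWPLTIVLCIFNCVYALNN"  # residues 57-81
HELIX_3 = "VYLGFSIVFTIVSIVIWIMYFVNSI"  # residues 82-106
POLAR = set("STNQ")  # assumed polar set; Tyr, Trp and Cys are left out


def wheel_angles(segment, polar):
    """Wheel angle (degrees) of each polar residue; first residue at 0."""
    return [(i * DEGREES_PER_RESIDUE) % 360.0
            for i, aa in enumerate(segment) if aa in polar]


def smallest_arc(angles):
    """Smallest arc (degrees) of the circle that contains every angle."""
    if len(angles) < 2:
        return 0.0
    ordered = sorted(a % 360.0 for a in angles)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    gaps.append(ordered[0] + 360.0 - ordered[-1])  # gap across 360 -> 0
    return 360.0 - max(gaps)


for name, segment in [("helix 2", HELIX_2), ("helix 3", HELIX_3)]:
    angles = wheel_angles(segment, POLAR)
    print(name, sorted(angles), smallest_arc(angles))
```

With this assumed polar set both segments give an arc of 140°, in line with the clustering described in the text; a broader definition of polar residues would widen the arc.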